class: center, middle, inverse, title-slide

.title[
# Statistical Concepts Everyone Should Know
]
.subtitle[
## Statistics for Life
]
.author[
### John Slough for The John & Calvin Podcast
]

---

## 1. Mean vs Median
---

## Mean vs Median as Skew (σ) Rises
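The original plot for this slide is not available, so here is a rough sketch of the idea instead. It assumes the data are log-normal and that σ is the log-scale standard deviation; both are assumptions, as are the names `lognormal_mean_median` and `theoretical_ratio`. For a log-normal, the median stays at `exp(mu)` while the mean grows as `exp(mu + sigma^2 / 2)`, so the gap widens as σ rises.

```python
import math
import random
import statistics


def lognormal_mean_median(sigma, n=50_000, mu=0.0, seed=0):
    """Draw n log-normal values and return (sample mean, sample median)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return statistics.fmean(draws), statistics.median(draws)


def theoretical_ratio(sigma):
    """Mean / median of a log-normal: exp(mu + sigma^2 / 2) / exp(mu)."""
    return math.exp(sigma ** 2 / 2)


if __name__ == "__main__":
    # Assumed sigma values, chosen only to show the trend.
    for sigma in (0.25, 0.5, 1.0, 1.5):
        mean, median = lognormal_mean_median(sigma)
        print(
            f"sigma={sigma:<4} mean={mean:6.2f} median={median:5.2f} "
            f"mean/median={mean / median:5.2f} "
            f"(theory {theoretical_ratio(sigma):.2f})"
        )
```

The median barely moves across the rows, while the mean is pulled upward by the long right tail.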
---

## 1. Applications of Mean vs Median

- **Wealth and Income** *"Average net worth increased"*: the gains went mostly to billionaires; the median person saw little change.
- **Real Estate Prices** *"Average home price is \$800,000"*: driven by a few luxury sales; the median is much lower.
- **Medical Costs** *"Average hospital bill is \$25,000"*: rare extreme cases inflate the mean.
- **Life Expectancy** *"Average lifespan is 40 years"*: high infant mortality lowers the mean dramatically.
- **Response Times for Emergency Services** *"Average ambulance arrival time is 8 minutes"*: the median is often much shorter.
- **GDP per Capita** *"GDP per person is \$60,000"*: the median citizen earns far less in many countries.
- **Asteroid Impact Energy** *Mean impact = “city-killer”*: dominated by once-in-a-million-years giants; the median impact is a harmless fireball.
- **Software Bug Fix Times** *"Average bug fix time is 15 days"*: most fixes happen quickly; a few linger for months.

---

## Mean or Median? It Depends on the Shape of Your Data

- **Symmetric (bell-shaped) distributions**
  *Mean ≈ Median* → either works
  <span style="font-size:0.9em;color:gray;">(heights, random errors, test scores in large classes)</span>
- **Skewed distributions (long tail on one side)**
  **Use the Median**: it is barely moved by extreme outliers
  <span style="font-size:0.9em;color:gray;">(wealth, medical bills, wait times)</span>
- **Heavy-tailed or “black-swan” data**
  Median or percentiles are safest; the mean can explode
  <span style="font-size:0.9em;color:gray;">(earthquake energy, insurance losses)</span>
- **Bimodal / multi-cluster data**
  No single centre makes sense → show the full distribution or the two modes
  <span style="font-size:0.9em;color:gray;">(commuter travel times, exam grades with pass/fail peaks)</span>

.small[
*Rule of thumb ▶ If a handful of extreme observations could swing the result, report the **median** (and a spread measure) instead of the mean.*
]

---

### From “What’s the Average?” → “What’s the Distribution?”

- **Why the mean misled us:** the income data are **skewed**, so a few extreme values dragged the mean away from the typical person (≈ median).
- **What really matters is the full shape** of the data: its **distribution**.
- **Normal distribution (bell curve)**
  - Symmetric, mean = median = mode.
  - Central Limit Theorem ⇒ sample means tend toward normal as the sample size grows.
- **But many real-world datasets aren’t normal:**
  - Incomes/wealth → **log-normal / Pareto** (long right tail)
  - Failure times → **Weibull / Exponential** (many early, few late)
  - Web traffic, social-media posts, bug lifetimes … often heavy-tailed

.small[
**Takeaway ▶** Before quoting a single “average,” look at the distribution. If it’s skewed or heavy-tailed, report the **median or percentiles** and pick the distribution that actually fits the data.
]

---

## Bias

**Bias** is the difference between the expected value of an estimator and the true value of the parameter it estimates. Formally:

$$
\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
$$

Where:

- `\(\hat{\theta}\)` = the **estimator** (your calculated estimate)
- `\(\theta\)` = the **true parameter** (the real value you want)
- `\(\mathbb{E}[\hat{\theta}]\)` = the **expected value** of the estimator (its long-run average over many samples)
- If Bias = 0 → the estimator is **unbiased**.
- If Bias ≠ 0 → the estimator is **biased** (systematically too high or too low).

---

Bias doesn’t just shift your first guess. It filters what you see next, making you even more biased over time.

- **Selection bias:** The sample is not representative of the population.
- **Omitted variable bias:** Leaving out a variable that influences both the dependent and independent variables.
- **Measurement bias (or information bias):** Errors in how data are collected or recorded.
- **Survivorship bias:** Analyzing only the "survivors" (those who remain) and ignoring those who dropped out or failed.
- **Recall bias:** Errors that arise because people remember things inaccurately (common in surveys and retrospective studies).
- **Observer bias:** The researcher's expectations subtly influence measurements or observations.
- **Publication bias:** Studies with "positive" results are more likely to be published than those with "null" or "negative" results.

---

## Bias

### The Self-Reinforcing Feedback Loop of Bias
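One way to make the loop concrete is a toy simulation. This is a sketch under invented assumptions, not a model taken from the talk: the true effect is 0, evidence is standard-normal noise, and a hypothetical observer always keeps evidence that agrees in sign with the current belief but keeps disagreeing evidence only with probability `keep_disconfirming`. The function name and all parameter values are illustrative.

```python
import random


def run_observer(keep_disconfirming, steps=20_000, first_guess=0.1, seed=1):
    """Return the observer's final belief about an effect whose true value is 0.

    Evidence is drawn from N(0, 1). Evidence that agrees in sign with the
    current belief is always kept; evidence that disagrees is kept only with
    probability keep_disconfirming. The belief is the running mean of the
    first guess and all kept evidence.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total, count = first_guess, 1
    for _ in range(steps):
        belief = total / count
        evidence = rng.gauss(0.0, 1.0)
        agrees = (evidence >= 0) == (belief >= 0)
        # The filter: today's belief decides which evidence gets counted.
        if agrees or rng.random() < keep_disconfirming:
            total += evidence
            count += 1
    return total / count


fair = run_observer(keep_disconfirming=1.0)      # counts all evidence
filtered = run_observer(keep_disconfirming=0.3)  # drops most disagreeing evidence
print(f"true effect = 0.00 | fair = {fair:+.2f} | filtering = {filtered:+.2f}")
```

The fair observer ends near the truth. The filtering observer starts with a small first guess and ends well away from zero, because each belief shapes which evidence is counted next. Which side it settles on can depend on the earliest evidence it happens to see.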
---

### Consequence of the Self-Reinforcing Feedback Loop of Bias

Conceptual Edition

---

### Consequence of the Self-Reinforcing Feedback Loop of Bias

Carnivore Edition

---

### Consequence of the Self-Reinforcing Feedback Loop of Bias

Vegan Edition